Euclidean Bus Mobility and Route Optimization, A Comparison
Routes in New York City, NY
Author
Alan Vlakancic
Published
November 30, 2025
Introduction:
This project uses stplanr transport modeling package to design an optimal transport route for bus or cycle routes in New York City. Stplanr is a transport planning visualization R package that can be used to plan transit networks, in addition to transit planning elements such as identifying transit catchment areas, origin/destination data and ride frequency visualizations, among others. In their white paper, the devlopers of stplanr call for an accountable, transparent and democratized transit planning system that doesn’t rely on proprietary and often vastly different data sources and data processing softwares. Although the package can visualize a whole host of data, this project will focus on comparing direct desire lines or “Euclidean” routes (as the crow flies), existing bus networks and stplanr’s optimized routes. To wit: this can map the efficiency of the bus routes compared to the most direct route possible if there were no built environment factors in the way.
Methods:
To adequately compare desire lines, bus routes, and the most efficient routes with the current street network, this project will require at minimum three data sources. Each of these will be sourced separately and overlaid onto each other:
Data Source: leaflet data for basemap
Methods: This can be sourced directly into R by installing the leaflet package. It is an interactive map so the user can navigate it.
Data Source: NYC Open Data for Bus Shelter locations
Despite significant searching, there is no comprehensive bus stop dataset, so the project will focus on bus stop shelters, which are mapped via NYC Open Data. I used Bus Shelters as there would be thousands upon thousands of bus stops in NYC, and this would be too computationally intensive to process.
Methods: This is brought into R as a CSV file. Each bus stop shelter has longitude and latitude coordinates that align with leaflet and OpenStreetMap projections.
Data Source: NYC Open Data for NYC Borough Boundaries
Methods: This is brought into R as a shapefile. This will provide the basemap boundary for NYC to ensure all data is within the city limits, represented on the leaflet map.
Load the necessary R packages for spatial data manipulation and visualization (e.g., ggmap, dplyr, stplanr, osmdata, sf, leaflet).
Import the NYC basemap shapefile and bus shelter CSV data into R.
Convert the bus shelter data into an sf object with appropriate coordinate reference system (CRS).
Terms:
Desire Lines & “Euclidean”: Straight lines connecting origin and destination points, representing the most direct path between them.
OSRM: Open Street Routing Machine, a routing engine that uses Open Street Map data to calculate routes, shortest routes, travel times, and can be used to make travel time maps, distance routing for car, bike and walking.
#SHAPEFILES AND MAPbbox <-c(left =-73.96, bottom =40.54, right =-73.70, top =40.81)#create bounding box for NYCnyc_map <-"data/"##nyc basemap, downloaded from nyc open data. source: https://search.r-project.org/CRAN/refmans/ptools/html/nyc_bor.htmlnyc_sf <-st_read(nyc_map, quiet =TRUE)#bring in nyc_map as a sfshelters_sf <-read_csv("data/Bus_Stop_Shelter_20251020.csv")#NOTE: REPLACE WITH DATA WHEN USING QUARTO!#this brings in the bus stop shelter information. source: https://data.cityofnewyork.us/Transportation/Bus-Stop-Shelters/qafz-7myzshelters_sf_fix <-st_as_sf(shelters_sf, coords =c("Longitude","Latitude"), crs =4326)#convert to sf object with the correct coordinate reference systemosmdata::set_overpass_url("https://overpass-api.de/api/interpreter")#set overpass url for open street maps, finds specific data package, the below query will use thisosm_data <-opq(bbox = bbox) %>%add_osm_feature(key ="highway", value =c("primary","secondary")) %>%osmdata_sf()#import data for primary and secondary highways from open street maps#opq is overpass query, which is basically where you want to look
Show code
# STPLANR FUNCTIONSshelters_sf_fix <- shelters_sf_fix %>%mutate(id =paste0("S", row_number()))#add ID column for origin-destination pairs so they have a corresponding numberflow_all <-expand.grid(o = shelters_sf_fix$id, #create origind = shelters_sf_fix$id, #create destinationstringsAsFactors =FALSE) %>%#make sure they aren't factorsfilter(o != d) %>%# this remove self-pairs so O is not Dmutate(trips =1) %>%#add trip count of 1 for each pairsample_n(50) #sample 50 random paris to avoid blowing up the computerdesire_lines_all <-od2line(flow_all, zones = shelters_sf_fix, zone_code ="id") #use od2line function to create desire lines (euclidean) for all pairsshelter_coords <- shelters_sf_fix %>%st_coordinates() %>%as.data.frame() %>%bind_cols(id = shelters_sf_fix$id)#extract coordinates and bind with ID columnroute_single <-function(o_id, d_id) { #function to create a single route between origin and destination o <- shelter_coords %>%filter(id == o_id) #filter to get origin coordinates d <- shelter_coords %>%filter(id == d_id)#filter to get destination coordinates r <-try(route_osrm(from =c(o$X, o$Y),to =c(d$X, d$Y)), silent =TRUE)#use try to catch errors (e.g., no route found)if (inherits(r, "try-error")) return(NULL)#if route found, return the routereturn(r)}routes_list <- purrr::map2(flow_all$o, flow_all$d, route_single)#create routes for all origin-destination pairs using the route_single functionroutes_list <- routes_list[!sapply(routes_list, is.null)]#remove any NULL results (failed routes)routes_sf <-do.call(rbind, routes_list)#combine all routes into a single sf object
Show code
#ROUTE & DESIRE LENGTH CALCULATION, MEAN & PCT CHANGEroutes_projection <-st_transform(routes_sf, 32618)desire_projection <-st_transform(desire_lines_all, 32618)#ensures the correct, projected shapefile for computation not mappingroute_length <-st_length(routes_projection)desire_length <-st_length(desire_projection)#compute lengthsroute_length <-as.numeric(route_length)desire_length <-as.numeric(desire_length)#convert lengths to numeric valueslengths_tbl <-tibble(route_m = route_length,desire_m = desire_length,origin = flow_all$o,destination = flow_all$d)#create tibble to compare lengths in the final map w/ IDslengths_tbl_print <-tibble(route_km = route_length /1000,desire_km = desire_length /1000,origin = flow_all$o,destination = flow_all$d) %>%mutate(ID =row_number(),difference =round(route_km - desire_km) ) %>%mutate(route_km =comma(round(route_km)),desire_km =comma(round(desire_km)) ) %>%arrange(desc(difference))#tidy data for later printing in a kablelengths_tbl_print <- lengths_tbl_print %>%select(ID, origin, destination, route_km, desire_km, difference) mean_route <-mean(route_length, na.rm =TRUE)mean_desire <-mean(desire_length, na.rm =TRUE)#calculate means for both route and desire lengthspercent_change <- ((mean_route - mean_desire) / mean_desire) *100#calculate percent change mean_lengths <-data.frame(type =c("Route Length", "Desire Line Length"),mean_length_m =c(round(mean_route), round(mean_desire)))#put these into a data frame, rounded to whole numbersmean_lengths <- mean_lengths %>%mutate(mean_length_km = mean_length_m /1000,percent_change =c(percent_change, NA) )#add km conversion and percent change to the data frame, i converted to KM for ease of computation (ie: dividing by 1,000)mean_lengths_print <- mean_lengths %>%mutate(mean_length_km =comma(round(mean_length_m /1000)),percent_change =comma(round(percent_change)) )#tidy data for later printing in a kable
Show code
#LEAFLET PREPnyc_leaflet <-st_transform(nyc_sf, 4326)roads_leaflet <-st_transform(osm_data$osm_lines, 4326)desire_leaflet <-st_transform(desire_lines_all, 4326)routes_leaflet <-st_transform(routes_sf, 4326)#transform all data to WGS84 for leaflet mappingdesire_leaflet_popup <-paste0("<b>Desire Line</b><br/>","Origin: ", desire_leaflet$o, "<br/>","Destination: ", desire_leaflet$d, "<br/>","Desire Line Distance: ", round(lengths_tbl$desire_m /1000), " km")#create popup info for desire lines for interactive maproutes_leaflet_popup <-paste0("<b> Route</b><br/>","Origin: ", desire_leaflet$o, "<br/>","Destination: ", desire_leaflet$d, "<br/>","Route Distance: ", round(lengths_tbl$route_m /1000), " km<br/>")#create popup info for routes for interactive mappal_desire <-colorNumeric(palette ="viridis",domain = lengths_tbl$desire_m)#create color palette for desire lines based on distancepal_routes <-colorNumeric(palette ="inferno",domain = lengths_tbl$route_m)#create color palette for routes based on distanceselected_ids <-unique(c(flow_all$o, flow_all$d))#get unique IDs of sampled sheltersselected_shelters <- shelters_sf_fix %>%filter(id %in% selected_ids)#filter shelters to only those that were sampled
# Mean Lengths Tablemean_lengths_print %>%kable(col.names =c("Type", "Mean Length (m)", "Mean Length (km)", "Percent Change (%)"),caption ="Mean Lengths of Routes vs Desire Lines") %>%kable_styling(full_width =FALSE, position ="left")
Mean Lengths of Routes vs Desire Lines
Type
Mean Length (m)
Mean Length (km)
Percent Change (%)
Route Length
18654
19
19
Desire Line Length
15702
16
NA
Show code
# Lengths Comparison Tablelengths_tbl_print %>%select(-ID) %>%kable(col.names =c("Origin", "Destination", "Route Length (km)", "Desire Line Length (km)", "Difference (km)"),caption ="Comparison of Route Lengths and Desire Line Lengths, By Difference") %>%kable_styling(full_width =FALSE, position ="center")
Comparison of Route Lengths and Desire Line Lengths, By Difference
Origin
Destination
Route Length (km)
Desire Line Length (km)
Difference (km)
S2388
S932
23
13
10
S2506
S3256
46
37
8
S2240
S1418
19
11
8
S1428
S2297
19
12
7
S671
S1422
25
19
6
S1334
S49
27
21
6
S2280
S3274
34
29
5
S1527
S2996
24
19
5
S3265
S2897
36
31
5
S1223
S2347
18
14
4
S2700
S3268
30
26
4
S1641
S2960
26
22
4
S690
S1163
27
23
4
S519
S1001
34
30
4
S1032
S359
16
13
3
S420
S1684
25
22
3
S492
S2482
23
20
3
S1260
S745
19
17
3
S1181
S606
21
18
3
S1306
S744
29
26
3
S257
S1084
19
16
3
S482
S2318
22
19
3
S2001
S1186
20
17
3
S1906
S3093
14
12
3
S3100
S1661
20
17
3
S2472
S1817
19
17
2
S1317
S1092
10
9
2
S875
S1606
24
22
2
S259
S2619
23
21
2
S1966
S226
15
13
2
S2243
S1556
12
10
2
S1599
S2602
19
17
2
S3277
S2047
38
36
2
S1043
S1902
12
9
2
S15
S2836
8
6
2
S2170
S1176
11
9
2
S2316
S3139
10
8
2
S2541
S2675
7
5
1
S61
S2263
12
12
1
S336
S368
14
13
1
S2373
S1675
16
15
1
S3350
S1490
17
15
1
S120
S596
4
3
1
S709
S1603
8
7
1
S2348
S2655
13
11
1
S1847
S2159
13
12
1
S268
S465
4
4
0
S80
S49
3
3
0
S1213
S1179
3
3
0
S1331
S1066
5
4
0
Figures:
Show code
gg1 <-ggplot(lengths_tbl_print, aes(x = ID, y = difference, text =paste("Origin: ", origin, "<br>Destination: ", destination))) +geom_bar(stat ="identity") +scale_fill_viridis_b()+labs(title ="Difference Between Route Length and Desire Line Length",x ="Route ID",y ="Difference (km)" ) +theme_dark()gg1_plotly <-ggplotly(gg1)gg1_plotly
Results:
The mean route length for the optimized routes for this particular sample run is 18.65 km, while the mean Euclidean desire line length is 15.7 km. This represents a percent change of 18.8% longer for the optimized routes compared to the direct desire lines. The interactive map above visualizes these routes, with desire lines colored based on their lengths and Open Street Routing Machine (OSRM) routes similarly colored with a different theme.
Discussion:
The results indicate that the optimized OSRM routes are significantly longer than the direct desire lines, which is generally expected given the constraints of the built environment and road network. The percent change of 18.8% suggests that while the desire lines represent the most direct path between two points, real-world travel must navigate around obstacles, follow roadways, and adhere to traffic regulations etc.
Limitations:
This does not represent all bus stops in NYC, just shelters. Although the exact number of bus stops is difficult to find, the MTA states that there are 327 bus routes in the five boroughs and countless stops in between. To make the data manageable both in computation and visualization, this study only selects 50 at random. This limits the amount of data points and does not fully capture the bus network.
The OSRM routing service may not always find a route between two points, especially if they are very close together or in areas with limited road connectivity. The code removes these unroutable routes, and they are not shown in the data.
The analysis does not account for real-world factors such as traffic, hazards, closures, road conditions, bus “bunching” or transit schedules, which can significantly impact actual travel times and route efficiency.The analysis assumes that the shortest path is the most efficient, which may not always be the case in real-world scenarios.
The sample size of 50 origin-destination pairs is relatively small and is not be representative of the entire bus network in NYC.
Only “Primary” and “Secondary” roads are sampled here, as the computation for the smaller roads (tertiary etc.) was too processing heavy. This eliminates a large selection of routes.
Future:
Future research could expand the sample size to include more origin-destination pairs, or even all bus stops. Incorporating real-world travel time data, traffic patterns, and transit schedules could provide a more comprehensive understanding of route efficiency. Further analysis could also explore the impact of different modes of transportation, such as cycling or walking, on route optimization and efficiency.